AITopics | evidence source

Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis

Huang, Yuhang, Lin, Zekai, Zhong, Fan, Liu, Lei

arXiv.org Artificial Intelligence, Nov-4-2025

Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations through an auditable sequence of actions. The agent learns a policy to strategically seek external visual evidence to support its diagnostic reasoning. This policy is optimized using reinforcement learning, resulting in a model that is both efficient and generalizable. Our experiments show that this action-based reasoning process significantly improves calibrated accuracy, reducing the Brier score by 18% compared to a non-interactive baseline. To validate the faithfulness of the agent's explanations, we introduce a causal intervention method. By masking the visual evidence the agent chooses to use, we observe a measurable degradation in its performance (ΔBrier = +0.029), confirming that the evidence is integral to its decision-making process. Our work provides a practical framework for building AI systems with verifiable and faithful reasoning capabilities.
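
For readers unfamiliar with the metric cited above: the Brier score is the mean squared difference between predicted probabilities and binary outcomes, so lower is better. The following is a minimal sketch, not the authors' code. The probabilities are made up for illustration, and the names `with_evidence` and `masked_evidence` are hypothetical. It shows how a masking delta such as ΔBrier could be computed.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Illustrative values only; none of these numbers come from the paper.
labels = [1, 0, 1, 1, 0]
with_evidence = [0.9, 0.2, 0.8, 0.7, 0.1]    # predictions using the chosen evidence
masked_evidence = [0.7, 0.3, 0.6, 0.6, 0.3]  # predictions with that evidence masked

bs_full = brier_score(with_evidence, labels)
bs_masked = brier_score(masked_evidence, labels)

# A positive delta means that masking the evidence made the predictions worse.
delta_brier = bs_masked - bs_full
print(f"Brier (full)={bs_full:.3f}, Brier (masked)={bs_masked:.3f}, delta={delta_brier:+.3f}")
```

A positive delta under masking is the kind of signal the abstract describes as confirming that the evidence matters to the decision.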

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.01425

Country: Asia > China (0.15)

Genre: Research Report (0.65)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM Responses

Chen, Tong, Wang, Zimu, Miao, Yiyi, Luo, Haoran, Sun, Yuanfei, Wang, Wei, Jiang, Zhengyong, Sen, Procheta, Su, Jionglong

arXiv.org Artificial Intelligence, Sep-23-2025

Medical fact-checking has become increasingly critical as more individuals seek medical information online. However, existing datasets predominantly focus on human-generated content, leaving the verification of content generated by large language models (LLMs) relatively unexplored. To address this gap, we introduce MedFact, the first evidence-based Chinese medical fact-checking dataset of LLM-generated medical content. It consists of 1,321 questions and 7,409 claims, mirroring the complexities of real-world medical scenarios. We conduct comprehensive experiments in both in-context learning (ICL) and fine-tuning settings, showcasing the capability and challenges of current LLMs on this task, accompanied by an in-depth error analysis to point out key directions for future research. Our dataset is publicly available at https://github.com/AshleyChenNLP/MedFact.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.17436

Country:

Europe (1.00)
North America > United States (0.93)
Asia (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Hepatology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review

Stewart-Evans, James, Wilson, Emma, Langley, Tessa, Prayle, Andrew, Hands, Angela, Exley, Karen, Leonardi-Bee, Jo

arXiv.org Artificial Intelligence, Jul-10-2025

The data extraction stages of reviews are resource-intensive, and researchers may seek to expedite data extraction using online large language models (LLMs) and review protocols. Claude 3.5 Sonnet was used to trial two approaches that used a review protocol to prompt data extraction from 10 evidence sources included in a case study scoping review. A protocol-based approach was also used to review extracted data. A limited performance evaluation found high accuracy for the two extraction approaches (83.3% and 100%) when extracting simple, well-defined citation details; accuracy was lower (9.6% and 15.8%) when extracting more complex, subjective data items. Considering all data items, both approaches had precision >90% but low recall (<25%) and F1 scores (<40%). The context of a complex scoping review, open response types and methodological approach likely impacted performance due to missed and misattributed data. LLM feedback considered the baseline extraction accurate and suggested minor amendments: 4 of 15 (26.7%) to citation details and 8 of 38 (21.1%) to key findings data items were considered to potentially add value. However, when repeating the process with a dataset featuring deliberate errors, only 2 of 39 (5%) errors were detected. Review-protocol-based methods used for expediency require more robust performance evaluation across a range of LLMs and review contexts with comparison to conventional prompt engineering approaches. We recommend researchers evaluate and report LLM performance if using them similarly to conduct data extraction or review extracted data. LLM feedback contributed to protocol adaptation and may assist future review protocol drafting.
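
The combination of high precision, low recall, and low F1 reported above follows from how the metrics are defined: F1 is the harmonic mean of precision and recall, so it is pulled towards the lower of the two. The sketch below uses hypothetical counts, not the study's data, chosen only so that the results fall inside the reported ranges.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: the extractor returns few items, most of them correct,
# but misses the majority of the items a human reviewer would have extracted.
tp, fp, fn = 19, 1, 81
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.1%}, recall={recall:.1%}, F1={f1:.1%}")
```

With these assumed counts, almost everything the extractor returns is correct, yet most expected items are missed, which keeps F1 low.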

extraction, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.06623

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Introducing Answered with Evidence -- a framework for evaluating whether LLM responses to biomedical questions are founded in evidence

Baldwin, Julian D, Dinh, Christina, Mukerji, Arjun, Sanghavi, Neil, Gombar, Saurabh

arXiv.org Artificial Intelligence, Jul-8-2025

The growing use of large language models (LLMs) for biomedical question answering raises concerns about the accuracy and evidentiary support of their responses. To address this, we present Answered with Evidence, a framework for evaluating whether LLM-generated answers are grounded in scientific literature. We analyzed thousands of physician-submitted questions using a comparative pipeline that included: (1) Alexandria, formerly known as the Atropos Evidence Library, a retrieval-augmented generation (RAG) system based on novel observational studies, and (2) two PubMed-based retrieval-augmented systems (System and Perplexity). We found that PubMed-based systems provided evidence-supported answers for approximately 44% of questions, while the novel evidence source did so for about 50%. Combined, these sources enabled reliable answers to over 70% of biomedical queries. As LLMs become increasingly capable of summarizing scientific content, maximizing their value will require systems that can accurately retrieve both published and custom-generated evidence, or generate such evidence in real time.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.02975

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area (0.69)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Evaluating Evidential Reliability In Pattern Recognition Based On Intuitionistic Fuzzy Sets

Xu, Juntao, Zhan, Tianxiang, Deng, Yong

arXiv.org Artificial Intelligence, Oct-30-2024

Determining the reliability of evidence sources is a crucial topic in Dempster-Shafer theory (DST). Previous approaches have addressed high conflicts between evidence sources using discounting methods, but these methods may not ensure the high efficiency of classification models. In this paper, we combine DST with Intuitionistic Fuzzy Sets (IFS) and propose an algorithm for quantifying the reliability of evidence sources, called the Fuzzy Reliability Index (FRI). The FRI algorithm is based on decision quantification rules derived from IFS, defining the contribution of different basic probability assignments (BPAs) to correct decisions and deriving the evidential reliability from these contributions. The proposed method effectively enhances the rationality of reliability estimation for evidence sources, making it particularly suitable for classification decision problems in complex scenarios. Subsequent comparisons with DST-based algorithms and classical machine learning algorithms demonstrate the superiority and generalizability of the FRI algorithm. The FRI algorithm provides a new perspective for future decision probability conversion and reliability analysis of evidence sources.
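
To make the background concrete, the sketch below shows the classical discounting approach the abstract refers to, not the FRI algorithm itself. Each source's mass function is scaled by a reliability factor (Shafer discounting), and the discounted sources are then merged with Dempster's rule of combination. The frame, the mass values, and the reliability factors are all illustrative assumptions.

```python
from itertools import product


def discount(bpa, reliability, frame):
    """Shafer discounting: scale each mass by `reliability` and move the remainder to the full frame."""
    out = {focal: reliability * mass for focal, mass in bpa.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - reliability)
    return out


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions keyed by frozenset."""
    joint = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            joint[intersection] = joint.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in joint.items()}


# Two highly conflicting sources over a two-element frame (illustrative values).
frame = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.9, frame: 0.1}
m2 = {frozenset({"b"}): 0.8, frame: 0.2}

# Assumed reliabilities: source 1 is trusted more than source 2.
fused = combine(discount(m1, 0.9, frame), discount(m2, 0.5, frame))
for focal, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal), round(mass, 4))
```

In this setup the reliability factors are supplied by hand. Methods such as the one described above aim to estimate them from data instead.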

artificial intelligence, evidence source, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.00848

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
North America > United States (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.86)

Learning Improved Representations by Transferring Incomplete Evidence Across Heterogeneous Tasks

Davvetas, Athanasios, Klampanos, Iraklis A.

arXiv.org Artificial Intelligence, Dec-22-2019

Acquiring ground truth labels for unlabelled data can be a costly procedure, since it often requires manual labour that is error-prone. Consequently, the amount of labelled data available is limited by the constraints of manual data labelling. It is possible to increase the amount of labelled data samples by performing automated labelling or crowd-sourcing the annotation procedure. However, these approaches often introduce noise or uncertainty into the labelset, which leads to decreased performance of supervised deep learning methods. On the other hand, weak supervision methods remain robust to noisy labelsets or can be effective even with small amounts of labelled data. In this paper we evaluate the effectiveness of a representation learning method that uses external categorical evidence, called "Evidence Transfer", when only a small amount of corresponding evidence is available, which we term incomplete evidence. Evidence transfer is a robust solution against external unknown categorical evidence that can introduce noise or uncertainty. In our experimental evaluation, evidence transfer proves to be effective and robust against different levels of incompleteness, for two types of incomplete evidence.

artificial intelligence, machine learning, real evidence, (16 more...)

arXiv.org Artificial Intelligence

1912.1049

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Greece > Attica > Athens (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Multi-Source Fusion Operations in Subjective Logic

van der Heijden, Rens Wouter, Kopp, Henning, Kargl, Frank

arXiv.org Artificial Intelligence, May-3-2018

The purpose of multi-source fusion is to combine information from more than two evidence sources, or subjective opinions from multiple actors. For subjective logic, a number of different fusion operators have been proposed, each matching a fusion scenario with different assumptions. However, not all of these operators are associative, and therefore multi-source fusion is not well-defined for these settings. In this paper, we address this challenge, and define multi-source fusion for weighted belief fusion (WBF) and consensus & compromise fusion (CCF). For WBF, we show the definition to be equivalent to the intuitive formulation under the bijective mapping between subjective logic and Dirichlet evidence PDFs. For CCF, since there is no independent generalization, we show that the resulting multi-source fusion produces valid opinions, and explain why our generalization is sound. For completeness, we also provide corrections to previous results for averaging and cumulative belief fusion (ABF and CBF), as well as belief constraint fusion (BCF), which is an extension of Dempster's rule. With our generalizations of fusion operators, fusing information from multiple sources is now well-defined for all different fusion types defined in subjective logic. This enables wider applicability of subjective logic in applications where multiple actors interact.
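
The mapping between opinions and Dirichlet evidence mentioned above gives an intuitive reading of multi-source fusion. The sketch below is an illustration under stated assumptions, not the paper's definitions. It uses the conventional non-informative prior weight W = 2, ignores base rates, and excludes dogmatic opinions (zero uncertainty). In evidence space, cumulative fusion amounts to adding the evidence from every source, which is why the result does not depend on the order of the sources.

```python
W = 2.0  # non-informative prior weight; the conventional value, assumed here


def to_evidence(belief, uncertainty):
    """Map an opinion (belief masses, uncertainty) to Dirichlet evidence counts."""
    if uncertainty <= 0.0:
        raise ValueError("dogmatic opinions (u = 0) need separate handling")
    return {x: W * b / uncertainty for x, b in belief.items()}


def to_opinion(evidence):
    """Map Dirichlet evidence counts back to (belief masses, uncertainty)."""
    total = W + sum(evidence.values())
    belief = {x: r / total for x, r in evidence.items()}
    return belief, W / total


def cumulative_fusion(opinions):
    """Fuse any number of opinions by summing their evidence."""
    fused = {}
    for belief, uncertainty in opinions:
        for x, r in to_evidence(belief, uncertainty).items():
            fused[x] = fused.get(x, 0.0) + r
    return to_opinion(fused)


# Three hypothetical sources over the domain {x, y}; belief masses plus uncertainty sum to 1.
opinions = [
    ({"x": 0.6, "y": 0.2}, 0.2),
    ({"x": 0.3, "y": 0.3}, 0.4),
    ({"x": 0.5, "y": 0.0}, 0.5),
]
fused_belief, fused_uncertainty = cumulative_fusion(opinions)
print(fused_belief, fused_uncertainty)
```

The fused uncertainty is lower than that of any single source because evidence accumulates. Operators that lack such an additive form are the ones for which, as the abstract notes, a multi-source definition has to be constructed explicitly.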

artificial intelligence, fusion, information fusion, (17 more...)

arXiv.org Artificial Intelligence

1805.01388

Country: Europe > Germany (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.49)
